Speaker normalization for audio-visual articulation training
نویسندگان
چکیده
The paper describes formant based speaker normalization method suitable for speech visualization and articulation training systems. The method estimates the error function obtained from speaker formant characteristics for a given vowel. Estimated error function gives information for critical band filter shifting on mel-warped frequency scale. The paper also describes accurate technique for formant tracking.
منابع مشابه
HMM-based visual speech recognition using intensity and location normalization
This paper describes intensity and location normalization techniques for improving the performance of visual speech recognizers used in audio-visual speech recognition. For auditory speech recognition, there exist many methods for dealing with channel characteristics and speaker individualities, e.g., CMN (cepstral mean normalization), SAT (speaker adaptive training). We present two techniques ...
متن کاملSpeaker adaptation for audio-visual speech recognition
In this paper, speaker adaptation is investigated for audiovisual automatic speech recognition (ASR) using the multistream hidden Markov model (HMM). First, audio-only and visual-only HMM parameters are adapted by combining maximum a posteriori and maximum likelihood linear regression adaptation. Subsequently, the audio-visual HMM stream exponents are adapted to better capture the reliability o...
متن کاملMulti-pose lipreading and audio-visual speech recognition
In this article, we study the adaptation of visual and audio-visual speech recognition systems to non-ideal visual conditions. We focus on overcoming the effects of a changing pose of the speaker, a problem encountered in natural situations where the speaker moves freely and does not keep a frontal pose with relation to the camera. To handle these situations, we introduce a pose normalization b...
متن کاملSpeaker adaptation of an acoustic-to-articulatory inversion model using cascaded Gaussian mixture regressions
The article presents a method for adapting a GMM-based acoustic-articulatory inversion model trained on a reference speaker to another speaker. The goal is to estimate the articulatory trajectories in the geometrical space of a reference speaker from the speech audio signal of another speaker. This method is developed in the context of a system of visual biofeedback, aimed at pronunciation trai...
متن کاملGradient and Visual Speaker Normalization in the Perception of Fricatives
The role of visual information in speaker normalization of fricatives is examined by com paring listeners' responses to prototypical and non—prototypical male and female speech with their responses to audio—visual integrated stimuli (speech signal and face of speaker) in the fricatives [sj (" sod ") and [I] (" shod ") in Ohio English. Results from audio—only identification tasks suggest that th...
متن کامل